Specific codons control cellular resources and fitness

As cellular engineering progresses from simply overexpressing proteins to imparting complex phenotypes through multigene expression, judicious appropriation of cellular resources is essential. Since codon use is degenerate and biased, codons may control cellular resources at a translational level. We investigate how partitioning transfer RNA (tRNA) resources by incorporating dissimilar codon usage can drastically alter interdependence of expression level and burden on the host. By isolating the effect of individual codons’ use during translation elongation while eliminating confounding factors, we show that codon choice can trans-regulate fitness of the host and expression of other heterologous or native genes. We correlate specific codon usage patterns with host fitness and derive a coding scheme for multigene expression called the Codon Health Index (CHI, χ). This empirically derived coding scheme (χ) enables the design of multigene expression systems that avoid catastrophic cellular burden and is robust across several proteins and conditions.

Depiction of the RFM used to evaluate steady state ribosome occupancy probabilities and protein translation rates using a published model (41, 42). For extensive details on the RFM, please refer to the original publication. In this model, ribosomes initiate translation according to the λ0 rate, and then proceed to subsequent positions based on the sum of elongation times for 9 individual codons, which corresponds to the approximate footprint of a ribosome. (B) Steady state ribosome occupancies are calculated numerically in Matlab® by solving a system of ODEs, which varies depending on the length of the gene. Protein translation rate is the rate of translation of the final position, which is simply the ribosome density (rho or ρ) multiplied by the elongation rate at the final position (λn). (C) Calculated steady state ribosome occupancy values for codon optimized CFP (CAI = 1 with reference to highly expressed E. coli genes) vs. rare codon rich YFP (CAI = 0.16 with reference to highly expressed E. coli genes) using published codon elongation times (43) illustrate higher occupancy on the rare codon rich sequence. An initiation rate of 1 s⁻¹ was used to generate all the ribosome density plots. (D) Calculated translation rate vs. initiation rate for the optimal CFP vs. rare codon YFP indicates that poorly optimized sequences become elongation limited at lower translation initiation rates. (E and F) Since exact elongation times have only been estimated and reported variations in elongation time are relatively small, the analysis was repeated using codon elongation times with more extreme variation. Here, the analysis was repeated using 1/codon frequency in highly expressed E. coli genes as an analog of codon elongation times (see Supplementary Data S1 for details on codon times). In this case, elongation times are more extreme for rare codons, and cause the YFP sequence to be largely insensitive to translation initiation rate due to severe elongation limitation. These results illustrate that while genes with rare codons may sequester more ribosomes and limit system resources, they are predicted to vary less in expression, or be more insulated from variation in ribosome availability, since translation initiation does not limit their expression as much as codon-optimized genes.

(G) and CFP/mCherry pairs (H) shown in the other panels. While different sequences exhibit moderate fluctuations in cDNA, rare codon rich sequences do not reduce mRNA transcription in the same way they consistently reduce protein translation. A condition with CFP expressed from a weak RBS is included here and indicates that some competition occurs at the level of transcription, but competition at the level of translation is apparent between re-coded sequences that vary in relative protein expression but have similar cDNA levels. n = 3 reaction replicates for each data point, all bars represent means ± SD.

An outlier is a value that is more than 1.5 times the interquartile range away from the bottom or top of the box. A lack of overlapping notches indicates >95% confidence in differences between medians. We observe a significantly higher CAI (median of 0.68 vs. 0.51) and lower ENC (median of 40.5 vs. 47.6) in select operons (green group) relative to the remainder, suggesting that this group has higher CUB overall, and generally uses distinct codons from the rest of the operons in E. coli. All p values represent two tailed t tests.

Fig. S13. Available tRNA/anticodons in E. coli K12 MG1655 and corresponding copy # for each. tRNA copy # data taken from the genomic tRNA database (http://gtrnadb.ucsc.edu/). Codon-anticodon interactions have been previously published (7). Special cases where modified tRNAs are required for codon recognition are highlighted for inosine and lysidine modifications. Generally speaking, amino acids with more than 2 available codons also have more than a single tRNA available, while most amino acids with only 2 codons share a single tRNA anticodon. In the special case of the ATA codon for isoleucine, the anticodon is identical to the methionine tRNA, and the copy # is shown only for the isoleucine CAT tRNA (denoted with *). Note that T is used here in place of U.

Fig. S16. Data related to Figure 5A. RSCU differences between individual codons for CAI and χ indicate significantly larger differences for amino acids with multiple available tRNAs, suggesting tRNA competition could play a role in fitness improvements observed for χ re-coded genes. In this analysis, each amino acid is grouped based on whether there exists a single tRNA available (16 total codons) or multiple (43 total codons) (see Fig. S9, there are 10 AAs with multiple tRNA anticodons in E. coli). The RSCU difference between CAI and χ is calculated based on the difference for each codon between expected RSCU values for a perfectly adapted sequence on either scale (i.e., the difference between the RSCU value on either scale represented in Fig. 5A). An outlier is a value that is more than 1.5 times the interquartile range away from the bottom or top of the box. The observed lack of overlapping notches indicates >95% confidence in differences between medians. Statistical test is a two tailed t test.

Fig. S25. CAI and CHI (χ) analysis of phage genes. (A) Coliphages accessed from NCBI used in the analysis. (B) Calculated CHI (χ) and CAI (with respect to highly expressed E. coli genes) for 4311 individual E. coli genes or 860 coliphage genes. Phages generally do not have CAI higher than 0.7 and appear to tend slightly more towards higher χ than E. coli genes. (C and D) Calculated CAI (C) and χ (D) for 4311 E. coli genes and 860 coliphage genes. We observe a slightly but significantly higher median CAI for native E. coli genes, and a slightly but significantly higher median χ for coliphage genes. An outlier is a value that is more than 1.5 times the interquartile range away from the bottom or top of the box. A lack of overlapping notches indicates >95% confidence in differences between medians. Statistical tests are two tailed t tests.

Fig. S1. Ribosome flow model (RFM) of CFP and YFP. (A) Depiction of the RFM used to evaluate steady state ribosome occupancy probabilities and protein translation rates using a published model (41, 42). For extensive details on the RFM, please refer to the original publication. In this model, ribosomes initiate translation according to the λ0 rate, and then proceed to subsequent positions based on the sum of elongation times for 9 individual codons, which corresponds to the approximate footprint of a ribosome. (B) Steady state ribosome occupancies are calculated numerically in Matlab® by solving a system of ODEs, which varies depending on the length of the gene. Protein translation rate is the rate of translation of the final position, which is simply the ribosome density (rho or ρ) multiplied by the elongation rate at the final position (λn).
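
The ODE system described in this caption can be sketched in a few lines. The example below is a minimal illustration in Python rather than the authors' Matlab code, and it assumes the standard RFM equations from the RFM literature: site occupancy x_i gains flow λ(i-1)·x(i-1)·(1 - x_i) from the previous site and loses flow λi·x_i·(1 - x(i+1)) to the next. The function names, the forward Euler integrator, and the default step size are assumptions chosen for readability.

```python
def site_rates_from_codon_times(codon_times, footprint=9):
    """Group per-codon elongation times (s) into ribosome-sized sites.

    Each site rate is 1 / (sum of elongation times of its codons).
    The footprint of 9 codons follows the caption; a trailing partial
    chunk is kept as its own site (an assumption of this sketch).
    """
    rates = []
    for start in range(0, len(codon_times), footprint):
        chunk = codon_times[start:start + footprint]
        rates.append(1.0 / sum(chunk))
    return rates


def rfm_steady_state(init_rate, site_rates, dt=0.01, tol=1e-10, max_steps=200_000):
    """Integrate the RFM ODEs with forward Euler until occupancies settle.

    Returns (occupancies, translation_rate), where translation_rate is the
    occupancy of the last site times its elongation rate.
    dt * max(rate) should stay well below 1 for numerical stability.
    """
    n = len(site_rates)
    x = [0.0] * n
    for _ in range(max_steps):
        new_x = []
        for i in range(n):
            # Flow into site i: initiation for the first site, else from site i-1.
            source = init_rate if i == 0 else site_rates[i - 1] * x[i - 1]
            inflow = source * (1.0 - x[i])
            # Flow out of site i is blocked in proportion to occupancy of site i+1.
            free_next = 1.0 if i == n - 1 else 1.0 - x[i + 1]
            outflow = site_rates[i] * x[i] * free_next
            new_x.append(x[i] + dt * (inflow - outflow))
        change = max(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if change < tol:
            break
    return x, site_rates[-1] * x[-1]
```

Comparing a uniform rate vector with one that contains a single slow site shows the qualitative behaviour the caption refers to: occupancy upstream of the slow site rises and the translation rate falls.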

Fig. S2. Codon usage data related to Figure 1 and the TxTL assay. Codon usage bias (CUB) reported here as frequency for all 64 codons in E. coli MRE600 vs. K12 MG1655 to analyze similarity. Codon use does not significantly differ between the two strains, indicating the tRNA profile in MRE600 is likely similar based on the same observed translational selection. Each point represents the frequency of an individual codon (n = 64) in either of the 2 strains. A full set of annotated protein coding sequences for MRE600 and MG1655 was downloaded from NCBI for this analysis (CP014197.1 and NC_000913.3, respectively). Pearson's r and linear regression r² values are calculated from 64 individual codon frequency data points.

Fig. S3. Supplementary in vitro reaction rate data related to Figure 1. Relative protein synthesis rates for CFP/YFP pairs (A and B) and CFP/mCherry pairs (C and D) normalized to each gene expressed in isolation. Data represent protein expression rates from CFP with either high CAI (0.96) or low CAI (0.17) titrated against YFP or mCherry with high CAI (0.96 or 1.00 respectively). Sequences with high CAI exhibit relatively linear tradeoffs in protein expression rate, while titration of a low CAI sequence causes non-linear reduction in YFP or mCherry protein expression. n = 2 TxTL reaction replicates of single re-codes for each concentration level shown on the graphs. Solid lines are connected to sample means.

Fig. S4. Supplementary reaction rate and cDNA data related to Figure 1. (A and B) Relative protein synthesis rates for CFP/YFP pairs (A) and CFP/mCherry pairs (B) normalized to each gene expressed in isolation. Protein synthesis in rare codon rich YFP or mCherry sequences (low CAI) is not affected by competition with high CAI CFP sequences. (C and D) Relative protein synthesis rates for the same CFP/YFP pairs normalized instead to the high CAI sequences for each protein. Low CAI sequences consistently express at a slower rate. (E and F) Normalized cDNA levels for each reaction measured after 2.5 h. cDNA levels for competition reactions were measured and normalized to cDNA for individual genes expressed in isolation. (G and H) Absolute concentration of measured cDNA in the same reactions for CFP/YFP pairs

Fig. S5. Data processing workflow for growth and fluorescent data related to Figure 2. (A-C) Signals are gathered by taking the area under the curve (AUC) of growth (A) quantified by OD600, or fluorescence from YFP (B) or CFP (C), using a numerical trapezoidal integrator in MATLAB. The timespan is generally defined by the maximum amount of time before any of the signals saturate in any of the samples being measured within an experiment. AUC values are represented here by bar plots (D-F). These are further processed (G) to calculate fitness values (i.e., AUC induced ÷ AUC uninduced for OD600 or YFP signal, yielding Growth Fitness or Co-Expression Fitness respectively). The absolute AUC for CFP of the induced culture is used to calculate Expression Level. Growth defects as well as a reduction in YFP expression are generally observed when CFP is induced.
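
The AUC and fitness calculation described in this workflow can be illustrated with a short sketch. It is written in Python for illustration (the caption states that the original integration was done in MATLAB), the function names are hypothetical, and it assumes the caller has already truncated each time series to the span before any signal saturates.

```python
def trapezoid_auc(times, values):
    """Area under a sampled curve by the trapezoidal rule."""
    return sum(
        (times[i + 1] - times[i]) * (values[i] + values[i + 1]) / 2.0
        for i in range(len(times) - 1)
    )


def fitness(times, induced, uninduced):
    """Ratio of induced to uninduced AUC.

    Applied to OD600 this corresponds to Growth Fitness; applied to the
    YFP signal it corresponds to Co-Expression Fitness.
    """
    return trapezoid_auc(times, induced) / trapezoid_auc(times, uninduced)


# Toy time course (hours) with made-up OD600 readings.
hours = [0.0, 1.0, 2.0]
od_uninduced = [0.0, 1.0, 2.0]
od_induced = [0.0, 0.5, 1.0]
growth_fitness = fitness(hours, od_induced, od_uninduced)
```

A fitness below 1 indicates that induction reduced the integrated signal relative to the uninduced culture.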

Fig. S6. Assay validation data related to Figure 2. All data are for the high CAI sequences, and are OD normalized. Final fluorescence (A-C), area under the curve of fluorescence data (D-F), and protein level (G-I) correlate well when validated using LC-MS/MS. Panels (J-L) verify the relationship between the area under the curve representing total protein amount measured using targeted proteomics and the area under the curve of RFU data taken from a plate reader. This supports our assumption that fluorescence data are representative of total protein level. Taking fluorescent measurements of diluted endpoint samples indicates that fluorescent protein measurements are linear relative to protein level over a broad range (M-O). n = 3 reaction replicates for each data point, all bars represent means ± SD, correlation plots represent individual data points. Note that axes are consistent across each row.

Fig. S7. Data related to Figure 2. (A) Table of codons incorporated into rare codon re-coded CFP sequences represented in Fig. 1H. For each codon, the percent use in the E. coli K12 MG1655 genome is shown, as well as known tRNA genes that can decode each respective codon. tRNA genes that are present on the complemented pBAC-RARE2 are highlighted in red. (B) A vector map of pBAC-RARE2 to over-express 12 rare tRNA genes. Individual tRNA genes present on the vector are shown in red. A complete sequence for pBAC-RARE2 can be found in Data S10.

Fig. S8. Data related to Figure 3. (A) Hierarchical clustering and analysis of RSCU values in 773 E. coli operons. Data were generated by arranging RSCU values for 773 E. coli operons with more than 1 protein coding sequence into a matrix of 61 rows (one for each codon) and 773 columns (one for each operon). Hierarchical clustering was done in Matlab®, and a unique group of operons with distinct CUB is highlighted (colored green). Note that RSCU values greater than 3 are displayed as the same color as RSCU = 3. (B) List of operons/gene names present in the highlighted group. Many genes with distinct CUB are ribosomal proteins or involved in central carbon metabolism. (C and D) Box and whisker plots of select operons from the highlighted group in green (n = 56 operons) vs. the remainder (n = 717

Fig. S9. Data related to Figure 3. Analysis of CUB in E. coli operons vs. individual genes within those operons shows that genes in the same operon tend to have similar CUB. Data above are from 773 annotated E. coli operons with at least 2 protein coding genes (total of 2466 genes). CAI refers to codon adaptation index using highly expressed E. coli genes as a reference set (see methods). Pearson's r and p value were calculated from 2466 data points.
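
For readers unfamiliar with CAI, the following sketch shows the commonly used definition: each codon receives a relative adaptiveness weight (its count in the reference set divided by the count of the most used synonymous codon), and the CAI of a sequence is the geometric mean of the weights of its codons. The reference counts and the two-family codon table below are made-up toy values, not the highly expressed E. coli reference set used in this work, and the function names are hypothetical.

```python
import math


def relative_adaptiveness(reference_counts, synonymous_families):
    """Weight each codon by its count relative to the top synonymous codon.

    Assumes every codon in every family has a nonzero reference count.
    """
    weights = {}
    for family in synonymous_families:
        top = max(reference_counts[codon] for codon in family)
        for codon in family:
            weights[codon] = reference_counts[codon] / top
    return weights


def cai(codons, weights):
    """Geometric mean of codon weights; codons without a weight are skipped."""
    scored = [weights[codon] for codon in codons if codon in weights]
    return math.exp(sum(math.log(w) for w in scored) / len(scored))


# Toy reference set: two amino acids, two synonymous codons each.
toy_counts = {"CTG": 100, "TTA": 25, "AAA": 80, "AAG": 20}
toy_families = [("CTG", "TTA"), ("AAA", "AAG")]
toy_weights = relative_adaptiveness(toy_counts, toy_families)
```

With these toy weights, a sequence built only from the most used codons scores 1, and replacing one of two codons with a codon of weight 0.25 lowers the score to 0.5.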

Fig. S10. Data related to Figure 3 PCA analysis. Pareto plot showing variance explained by principal component analysis of 773 E. coli operons examined in Figure 3. PC1 with 28.7% of total variance was largely found to explain CAI (where CAI refers to codon adaptation index using highly expressed E. coli genes as a reference set).

Fig. S11. Data related to Figure 3 PCA analysis. PCA analysis of RSCU in 4,311 E. coli genes (as opposed to operons) with loadings mapped for the 10 codons with the highest contribution to variance. CAI is mapped onto individual genes and indicated in the figure legend. PCA on individual genes as opposed to operons reveals similar CUB trends across the E. coli transcriptome.

Fig. S12. CAI vs. expression level correlation data related to Figure 3. Expression data are previously published (48). Here we correlate calculated CAI value (where CAI refers to codon adaptation index using highly expressed E. coli genes as a reference set) with published expression data for genes where expression level was reported (total of n = 1014 genes represented here). Note that Pearson's r is calculated based on raw protein level (as opposed to log values).

Fig. S14. Amino acids re-coded for CFP and mCherry related to Figure 4. Differences in the number of amino acids between CFP and mCherry re-coded sequences could contribute to noise in the codon sensitivity analysis dataset. Notably, cysteine is underrepresented in CFP and absent in mCherry. The # refers to the total number re-coded, while the % refers to the number re-coded as a percentage of all amino acids in the protein.

Fig. S15. Method for deriving χ. Each data point representing normalized Co-Expression Fitness (as shown in Fig. 4B) for every codon re-coded in CFP and mCherry (n = 40) was quantified by taking the Euclidean distance from the origin to the codon coordinates. These raw scores were then used to create a new table of weights for calculating χ. Raw scores and calculated weights are given in Data S8.

Fig. S17. Algorithm used to create χ re-codes in Figure 5. A greedy algorithm implemented in Matlab® was used to create χ re-coded CFP and mCherry sequences. As pictured, the algorithm starts with any sequence (in this case we used a randomly re-coded version of CFP or mCherry as the input to avoid any initial bias), and randomly mutates a codon to a synonymous alternative. It then evaluates whether the new sequence is closer to the objective χ value. This is repeated until the final χ score is obtained. The same algorithm can be implemented with any objective function, e.g., for CAI, GC content, ENC, etc.
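
The greedy loop described in this caption can be sketched as follows. This is an illustration in Python, not the authors' Matlab implementation. The codon weights are made-up toy values (the published χ weights are in Data S8 and are not reproduced here), and scoring a sequence as the geometric mean of its codon weights is an assumption borrowed from the form of CAI. The acceptance rule (keep a mutation only if it moves the score closer to the target) follows the caption.

```python
import math
import random

# Toy synonymous-codon table with made-up weights (NOT the published χ weights).
TOY_WEIGHTS = {
    "L": {"CTG": 1.0, "CTC": 0.6, "TTA": 0.3},
    "P": {"CCG": 1.0, "CCA": 0.7, "CCC": 0.4},
    "K": {"AAA": 1.0, "AAG": 0.5},
}
CODON_TO_AA = {codon: aa for aa, table in TOY_WEIGHTS.items() for codon in table}


def index_score(codons):
    """Geometric mean of per-codon weights (assumed form of the objective)."""
    logs = [math.log(TOY_WEIGHTS[CODON_TO_AA[c]][c]) for c in codons]
    return math.exp(sum(logs) / len(logs))


def greedy_recode(codons, target, tol=0.01, max_iter=10_000, seed=0):
    """Randomly swap synonymous codons, keeping swaps that approach the target."""
    rng = random.Random(seed)
    current = list(codons)
    best_gap = abs(index_score(current) - target)
    for _ in range(max_iter):
        if best_gap <= tol:
            break
        pos = rng.randrange(len(current))
        aa = CODON_TO_AA[current[pos]]
        alternatives = [c for c in TOY_WEIGHTS[aa] if c != current[pos]]
        candidate = current[:]
        candidate[pos] = rng.choice(alternatives)
        gap = abs(index_score(candidate) - target)
        if gap < best_gap:  # accept only moves toward the objective value
            current, best_gap = candidate, gap
    return current


start = ["TTA", "CCC", "AAG", "TTA", "CCC", "AAG"]
recoded = greedy_recode(start, target=1.0)
```

Because only the scoring function refers to the index, swapping `index_score` for a CAI, GC content, or ENC calculation changes the objective without touching the search loop, which matches the caption's remark that any objective function can be used.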

Fig. S18. Correlation data from Figure 5 C-E demonstrating a strong relationship between Growth and Co-Expression Fitness with χ. (A and B) Strong correlation for CFP and mCherry χ re-codes with Co-Expression Fitness (as quantified by the chromosomally integrated YFP reporter in Fig. 2). (C and D) Strong correlation for CFP and mCherry χ re-codes with Growth Fitness (as quantified by OD600). In each case, results were normalized relative to the high CAI parent control. Individual data points are shown representing n = 13 normalized means for each correlation plot. Pearson's r and linear regression r² values are calculated from 13 points for each plot.

Fig. S19. Data related to Figure 6 testing χ vs. CAI sequences under different conditions. A weak RBS control was included, along with high vs. low CAI sequences, and the top performing χ = 0.95 #3 sequence using either M9 minimal media at 37°C (A), or varying temperatures between 30°C and 42°C in rich LB media (B-D). (E) qRT-PCR analysis of YFP, CFP, and mCherry fold change in cDNA levels from the 37°C LB condition. Changes in Co-Expression Fitness are not accounted for by RNA expression differences. Low mCherry and CFP RNA expression level for the weak RBS control is likely accounted for by higher degradation rates for poorly translated mRNA. All statistical tests are two tailed t tests, n = 3 biological replicates for each data point, all bars represent means ± SD. N.S. refers to p > 0.05. High CAI and high χ CFP and mCherry cDNA levels can be directly compared despite having different amplicons, given their similar amplification efficiency (Data S11).

Fig. S21. Data related to Figure 7 PCA analysis. Principal component 3 explains CHI (χ) re-coded sequence variation from Figure 7 very well, as there is a very strong correlation. Pearson's r and linear regression r² values are calculated from n = 13 individual data points representing single re-coded sequences.

Fig. S22. Comparison of CHI (χ) with other codon usage bias indices. Each correlation analysis is done using Pearson's r between χ and another index based on predicted RSCU values for each codon in a perfectly adapted sequence for n = 61 codons (excluding stop codons). (A) χ vs. CAI. The 3 outlier Arg, Leu, and Pro codons are clearly favored with CAI (where CAI is in reference to highly expressed E. coli genes). There is generally a poor correlation between the two scales. (B) χ vs. sTAI. The correlation between χ and sTAI is better, but there is generally less differentiation between codons for sTAI (e.g., the sTAI derived RSCU for several Leu and Pro codons is the same). (C) χ vs. nTE. χ and nTE are not well correlated. Despite considering supply vs. demand, nTE appears to exaggerate differences between codons, resulting in the over-avoidance of some and over-favoring of others relative to χ. Pearson correlation coefficients and linear regression r² values were calculated from n = 61 codon RSCU values in each plot. Dotted lines are equivalence lines between each axis.

Fig. S23. Correlations between χ and MFE (minimum free energy) or GC content of CHI (χ) re-coded sequences. (A and B) Correlation between χ and MFE for CFP and mCherry re-coded sequences. There is no correlation for the CFP re-codes, but there is a negative correlation for mCherry re-codes. This implies that for mCherry specifically there is likely additional structure forming due to the incorporation of χ favored codons, but the trend is not generalizable as it is not observed for CFP. (C and D) Correlation between χ and GC content for CFP and mCherry re-coded sequences. In both cases there is no correlation, and GC content does not substantially vary as a function of χ value. Pearson's r and linear regression r² values are calculated from n = 13 individual re-codes for each plot.

Fig. S24. CHI (χ) sequences relative to commercial re-coding algorithms. (A) Table of χ and CAI values (shown here using highly expressed E. coli genes or the entire genome as a reference set) for 10 genes re-coded using 3 different commercially available free codon optimization tools from IDT, Genewiz, or Genscript. The functional range of χ is between 0.6-1, so none of these sequences appear to explore the χ sequence space, falling at χ ∼ 0.8. IDT has the least bias towards either CAI scale. Genewiz appears to bias towards CUB very strongly with respect to all E. coli genes, while Genscript appears to adapt the sequence more moderately towards overall host codon usage. (B) Pareto plot of PCA of RSCU values of 61 codons for all commercially re-coded genes (30 total) along with 13 χ re-coded sequences (from Fig. 5) and 773 E. coli operons. 48.8% of total variance is represented by the first 3 components. (C and D) PCA analysis showing PC1 vs. PC2 or PC1 vs. PC3, with categorically labeled points. We know from previous analysis (Fig. 3) that PC1 largely represents CAI. It appears that Genscript re-codes most closely follow the natural sequence space of highly expressed genes. IDT re-codes align most closely with χ sequences, but do not explore the χ sequence space in any meaningful way.

Fig. S26. CHI (χ) and CAI values of random CFP sequence re-codes. (A) Calculated CAI (with respect to highly expressed E. coli genes) for 100,000 random sequences. The median CAI is 0.36, and even the highest outlier does not exceed 0.5. This illustrates why randomizing sequences is unlikely to affect global sequence bias towards specific codon use. (B) Calculated CHI (χ) for the same random sequences. We see a median χ score of 0.80, and the highest outlier does not exceed 0.85, again illustrating a lack of global variation in specific codon use towards higher χ from randomized sequences.

Fig. S27. Codon use vs. codon frequency in E. coli K12 MG1655. Genomic frequency was used for nTE calculations in place of codon use (see methods) because the two are very well correlated; that is, expression of genes in E. coli with different CUB does not appear to change codon demand in this case. Codon use was calculated as previously described (6) by taking the total codon count across all E. coli genes multiplied by their corresponding transcript abundance from a publicly available dataset (GEO accession GSE59377, "rpoB_wt_lb") (66), which corresponds to an exponentially growing culture of E. coli in LB broth. Pearson correlation coefficient and linear regression r² values are calculated from n = 64 codons.
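
The distinction between genomic codon frequency and expression-weighted codon use can be made concrete with a small sketch. It assumes the weighting is a simple sum over genes of (codon count in the gene × transcript abundance of the gene), which is one reading of the description above. The gene names, sequences, and abundances are made-up toy values, and the function names are hypothetical.

```python
from collections import Counter


def codon_counts(cds):
    """Count in-frame codons in a coding sequence (trailing bases ignored)."""
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def genomic_codon_counts(genes):
    """Unweighted codon counts summed over all genes."""
    total = Counter()
    for cds in genes.values():
        total.update(codon_counts(cds))
    return total


def expression_weighted_codon_use(genes, abundance):
    """Codon counts weighted by each gene's transcript abundance."""
    use = Counter()
    for name, cds in genes.items():
        for codon, count in codon_counts(cds).items():
            use[codon] += count * abundance.get(name, 0.0)
    return use


# Toy input: two short genes with very different transcript abundance.
toy_genes = {"geneA": "ATGCTGCTG", "geneB": "ATGTTA"}
toy_abundance = {"geneA": 10.0, "geneB": 1.0}
toy_use = expression_weighted_codon_use(toy_genes, toy_abundance)
toy_freq = genomic_codon_counts(toy_genes)
```

In this toy case the weighting shifts demand strongly towards CTG; the caption reports that for E. coli the two measures are nonetheless very well correlated across the 64 codons.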

Fig. S28. Vector maps for expression plasmids used in this study. Each vector consists of a unique origin/copy number control mechanism based on the origin of replication and in some cases accessory proteins. Copy numbers are approximated based on known values. (A) pBAC (F1 origin, ~1 copy); (B) p5 (pSC101 origin, ~5 copies); (C) p10 (p15A origin, ~10 copies); (D) p20 (pBR322 origin, ~20 copies). Full length vector sequences can be found in Data S10.

Fig. S31. Peptides used for LC/MS analysis of protein abundance. Note that CFP and YFP could not be differentiated as they were quantified using the same peptide, and only differ by 2 amino acids. For CFP and YFP, the peptide is 122 amino acids from the N terminus, and for mCherry it is 97 amino acids from the N terminus, thus giving confidence that the peptide quantification represents abundance of the entire protein.

Quality controls for qRT-PCR results. All data were generated using mean values from 2 technical replicates. (Sheet A) In vitro TxTL reaction primer specificity data using reaction templates, indicating Cq values for all primer pairs when tested with relevant templates for each pair. All pairs are sufficiently specific for analysis. (Sheet B) Primer sequences used for in vitro TxTL qRT-PCR analysis, and measured efficiency values. (Sheet C) In vivo primer specificity data using isolated cellular RNA, indicating Cq values for all primer pairs when tested with relevant templates for each pair. All pairs are sufficiently specific for analysis. (Sheet D) Primer sequences used for in vivo qRT-PCR analysis, and measured efficiency values. (Sheet E) Cq values for amplicons normalized to the same starting concentration for relevant CFP and mCherry templates. The close Cq values indicate that CFP high CAI and high χ re-coded sequences can be reasonably compared given they have very similar amplification efficiency.

Table S1. Table describing background E. coli proteins monitored for abundance by LC/MS targeted proteomics. Genes were selected to represent both higher and lower CAI levels from background E. coli host proteins.

Table S2. Sequence statistics for new re-coded enzymes in Figure 7. Represented for each gene is the calculated codon adaptation index (CAI), codon health index (χ), effective number of codons (ENC), and % GC content.